Initial Setup

The data I am using in this assignment is the gapminder dataset.

# Load the packages needed
# install.packages("prettydoc")
suppressPackageStartupMessages(library(prettydoc))
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(gapminder))
suppressPackageStartupMessages(library(forcats))
suppressPackageStartupMessages(library(ggthemes))
suppressPackageStartupMessages(library(kableExtra))
suppressPackageStartupMessages(library(gridExtra))
suppressPackageStartupMessages(library(grid))
suppressPackageStartupMessages(library(scales))
suppressPackageStartupMessages(library(plotly))
# Source: https://github.com/dgrtwo/gganimate
# install.packages("cowplot")  # a gganimate dependency
# devtools::install_github("dgrtwo/gganimate")
# suppressPackageStartupMessages(library(gganimate))

Part 1: Factor Management

Drop Oceania

Task Description: Filter the Gapminder data to remove observations associated with the continent of Oceania. Additionally, remove unused factor levels. Provide concrete information on the data before and after removing these rows and Oceania; address the number of rows and the levels of the affected factors.

Before making any changes, let us review the dimensions and structure of the original dataset. The gapminder dataset has 1704 rows and 6 columns, of which 24 observations belong to the continent Oceania.

# Review the dimensions of the original dataset
dim(gapminder)
## [1] 1704    6
# check the continent counts
continent_tbl <- as.data.frame(table(gapminder$continent))

# make the table of the continent counts  
continent_tbl%>%
  kable("html", caption = "Continent Counts",col.names = c("Continent", "Counts")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),full_width = F)%>%
  column_spec(1, bold = T, border_right = T) %>%
  column_spec(2, width = "20em")
Continent Counts
Continent Counts
Africa 624
Americas 300
Asia 396
Europe 360
Oceania 24

After checking the original dataset, I first filtered it to remove the observations associated with the continent of Oceania. The function levels() provides access to the levels attribute of a factor. Using it, we can see that the level “Oceania” is still present after filtering: removing the observations alone does not drop the level from the factor.

The function droplevels() drops unused levels from a factor or, more commonly, from all factors in a data frame. Here I use it to drop the level “Oceania”, which no longer has any observations in the filtered dataset. Because it acts on every factor in the data frame, it also drops the now-unused Oceania countries from country.

# Filter the data to remove the observations from Oceania
new_dat <- gapminder %>%
  filter(continent!="Oceania")

# access the levels attribute of the variable continent
levels(new_dat$continent)
## [1] "Africa"   "Americas" "Asia"     "Europe"   "Oceania"
# explicitly drop the unused levels
new_dat_drop <- new_dat %>%
  droplevels()

# access the levels attribute of the variable continent after dropping unused levels
levels(new_dat_drop$continent)
## [1] "Africa"   "Americas" "Asia"     "Europe"
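
To make the mechanism easier to see, here is a minimal sketch on a toy data frame, using base R only. The values and the names `toy`, `kept` and `dropped` are made up for illustration and are not part of gapminder.

```r
# Toy data frame (hypothetical values) with two factor columns
toy <- data.frame(
  continent = factor(c("Africa", "Asia", "Oceania", "Asia")),
  country   = factor(c("Kenya", "Japan", "Australia", "India"))
)

# Filtering removes the row but not the level
kept <- toy[toy$continent != "Oceania", ]
levels(kept$continent)   # "Oceania" is still listed
table(kept$continent)    # ... with a count of zero

# droplevels() on a data frame cleans every factor column at once
dropped <- droplevels(kept)
levels(dropped$continent)
levels(dropped$country)  # "Australia" is gone as well
```

If only one column should be cleaned, droplevels() can also be applied to that single factor instead of the whole data frame.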

From the continent counts for the two cases we can see that removing the observations associated with a level only sets that level's count to zero; the level itself stays in the factor until it is explicitly dropped.

Continent Counts after removing the observations from Oceania
Continent Counts
Africa 624
Americas 300
Asia 396
Europe 360
Oceania 0
Continent Counts after dropping the Oceania level
Continent Counts
Africa 624
Americas 300
Asia 396
Europe 360
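
The task also asks for the number of rows and the levels of the affected factors before and after. One possible way to collect these in a single table is sketched below. It assumes the gapminder package is installed; the names `before`, `after` and `summary_tbl` are my own, and base R's subset() is used in place of filter() so that the sketch needs no other package.

```r
library(gapminder)

before <- gapminder
# Remove the Oceania rows, then drop the unused levels in all factor columns
after  <- droplevels(subset(gapminder, continent != "Oceania"))

# Rows and level counts for both affected factors, side by side
summary_tbl <- data.frame(
  stage            = c("before", "after"),
  rows             = c(nrow(before), nrow(after)),
  continent_levels = c(nlevels(before$continent), nlevels(after$continent)),
  country_levels   = c(nlevels(before$country), nlevels(after$country))
)
summary_tbl
```

The row count should fall by the 24 Oceania observations (1704 to 1680), and country loses levels too, since the Oceania countries no longer appear in the data.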

Reorder the levels of country or continent.

Use the forcats package to change the order of the factor levels, based on a principled summary of one of the quantitative variables. Consider experimenting with a summary statistic beyond the most basic choice of the median.

I will reorder the levels of country by the maximum gdpPercap of each country.

# reorder the levels of country by the maximum `gdpPercap` of each country. 
country_reorder <- gapminder$country %>%
  fct_reorder(gapminder$gdpPercap, max)

# levels after reordering
head(levels(country_reorder))
## [1] "Burundi"    "Ethiopia"   "Malawi"     "Zimbabwe"   "Liberia"   
## [6] "Mozambique"
# comparing with levels of original dataset
head(levels(gapminder$country))
## [1] "Afghanistan" "Albania"     "Algeria"     "Angola"      "Argentina"  
## [6] "Australia"
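
To see how the choice of summary function changes the result, here is a small sketch on toy data (hypothetical values, not gapminder). It assumes the forcats package is installed; `.fun` and `.desc` are arguments of fct_reorder().

```r
library(forcats)

# Toy data (hypothetical values): three countries, two observations each
country <- factor(c("a", "a", "b", "b", "c", "c"))
gdp     <- c(1, 10, 5, 6, 2, 3)

# Order levels by each country's maximum: c (3) < b (6) < a (10)
by_max <- fct_reorder(country, gdp, .fun = max)
levels(by_max)

# Order levels by each country's minimum: a (1) < c (2) < b (5)
by_min <- fct_reorder(country, gdp, .fun = min)
levels(by_min)

# .desc = TRUE reverses the direction
by_max_desc <- fct_reorder(country, gdp, .fun = max, .desc = TRUE)
levels(by_max_desc)
```

The same factor ends up with a different level order for each summary, which is why it is worth stating which statistic was used.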

Explore the effects of arrange()

To explore the effects of arrange(), I first create a smaller subset of the gapminder dataset, then apply fct_reorder() and arrange() to it separately. The resulting tables below show the difference: fct_reorder() changes only the order of the factor levels and leaves the rows where they are (so its table is identical to the original one), while arrange() sorts the observations themselves by the variable we specify.

# Get observations from continent `Europe` and randomly select 5 countries
sub_dat <- gapminder %>% 
  filter(continent == "Europe")

# randomly sample 5 countries from the continent `Europe`
set.seed(0)
spl_id <- sample(unique(sub_dat$country), 5)
sub_dat <- sub_dat %>% filter(country %in% spl_id)

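# reordering by gdpPercap (fct_reorder() summarises with the median by default)
reorder_dat <- sub_dat %>%
  mutate(country = fct_reorder(country, gdpPercap, .desc = TRUE))

# arranging by gdpPercap
# note: arrange() ignores grouping unless .by_group = TRUE,
# so the rows are sorted across all countries
arrange_dat <- sub_dat %>%
  group_by(country) %>%
  arrange(gdpPercap)
# arrange(desc(gdpPercap))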
# make the table of original subdata
sub_dat %>%
  kable("html", caption = "Table of the newly created sub-dataset",
        col.names = c("Country", "Continent", "Year", "Life Expectancy", "Population", "GDP per capita")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),full_width = F)%>%
  column_spec(1, width = "10em", border_right = T) %>%
  column_spec(2, width = "10em") %>%
  scroll_box(width = "900px", height = "400px")
Table of the newly created sub-dataset
Country Continent Year Life Expectancy Population GDP per capita
Denmark Europe 1952 70.780 4334000 9692.385
Denmark Europe 1957 71.810 4487831 11099.659
Denmark Europe 1962 72.350 4646899 13583.314
Denmark Europe 1967 72.960 4838800 15937.211
Denmark Europe 1972 73.470 4991596 18866.207
Denmark Europe 1977 74.690 5088419 20422.901
Denmark Europe 1982 74.630 5117810 21688.040
Denmark Europe 1987 74.800 5127024 25116.176
Denmark Europe 1992 75.330 5171393 26406.740
Denmark Europe 1997 76.110 5283663 29804.346
Denmark Europe 2002 77.180 5374693 32166.500
Denmark Europe 2007 78.332 5468120 35278.419
Germany Europe 1952 67.500 69145952 7144.114
Germany Europe 1957 69.100 71019069 10187.827
Germany Europe 1962 70.300 73739117 12902.463
Germany Europe 1967 70.800 76368453 14745.626
Germany Europe 1972 71.000 78717088 18016.180
Germany Europe 1977 72.500 78160773 20512.921
Germany Europe 1982 73.800 78335266 22031.533
Germany Europe 1987 74.847 77718298 24639.186
Germany Europe 1992 76.070 80597764 26505.303
Germany Europe 1997 77.340 82011073 27788.884
Germany Europe 2002 78.670 82350671 30035.802
Germany Europe 2007 79.406 82400996 32170.374
Italy Europe 1952 65.940 47666000 4931.404
Italy Europe 1957 67.810 49182000 6248.656
Italy Europe 1962 69.240 50843200 8243.582
Italy Europe 1967 71.060 52667100 10022.401
Italy Europe 1972 72.190 54365564 12269.274
Italy Europe 1977 73.480 56059245 14255.985
Italy Europe 1982 74.980 56535636 16537.483
Italy Europe 1987 76.420 56729703 19207.235
Italy Europe 1992 77.440 56840847 22013.645
Italy Europe 1997 78.820 57479469 24675.024
Italy Europe 2002 80.240 57926999 27968.098
Italy Europe 2007 80.546 58147733 28569.720
Slovak Republic Europe 1952 64.360 3558137 5074.659
Slovak Republic Europe 1957 67.450 3844277 6093.263
Slovak Republic Europe 1962 70.330 4237384 7481.108
Slovak Republic Europe 1967 70.980 4442238 8412.902
Slovak Republic Europe 1972 70.350 4593433 9674.168
Slovak Republic Europe 1977 70.450 4827803 10922.664
Slovak Republic Europe 1982 70.800 5048043 11348.546
Slovak Republic Europe 1987 71.080 5199318 12037.268
Slovak Republic Europe 1992 71.380 5302888 9498.468
Slovak Republic Europe 1997 72.710 5383010 12126.231
Slovak Republic Europe 2002 73.800 5410052 13638.778
Slovak Republic Europe 2007 74.663 5447502 18678.314
Sweden Europe 1952 71.860 7124673 8527.845
Sweden Europe 1957 72.490 7363802 9911.878
Sweden Europe 1962 73.370 7561588 12329.442
Sweden Europe 1967 74.160 7867931 15258.297
Sweden Europe 1972 74.720 8122293 17832.025
Sweden Europe 1977 75.440 8251648 18855.725
Sweden Europe 1982 76.420 8325260 20667.381
Sweden Europe 1987 77.190 8421403 23586.929
Sweden Europe 1992 78.160 8718867 23880.017
Sweden Europe 1997 79.390 8897619 25266.595
Sweden Europe 2002 80.040 8954175 29341.631
Sweden Europe 2007 80.884 9031088 33859.748
# make the result table of reordering
reorder_dat %>%
  kable("html", caption = "Result Table after reordering by  `gdpPercap`",
        col.names = c("Country", "Continent", "Year", "Life Expectancy", "Population", "GDP per capita")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),full_width = F)%>%
  column_spec(1, width = "10em", border_right = T) %>%
  column_spec(2, width = "10em") %>%
  scroll_box(width = "900px", height = "400px")
Result Table after reordering by gdpPercap
Country Continent Year Life Expectancy Population GDP per capita
Denmark Europe 1952 70.780 4334000 9692.385
Denmark Europe 1957 71.810 4487831 11099.659
Denmark Europe 1962 72.350 4646899 13583.314
Denmark Europe 1967 72.960 4838800 15937.211
Denmark Europe 1972 73.470 4991596 18866.207
Denmark Europe 1977 74.690 5088419 20422.901
Denmark Europe 1982 74.630 5117810 21688.040
Denmark Europe 1987 74.800 5127024 25116.176
Denmark Europe 1992 75.330 5171393 26406.740
Denmark Europe 1997 76.110 5283663 29804.346
Denmark Europe 2002 77.180 5374693 32166.500
Denmark Europe 2007 78.332 5468120 35278.419
Germany Europe 1952 67.500 69145952 7144.114
Germany Europe 1957 69.100 71019069 10187.827
Germany Europe 1962 70.300 73739117 12902.463
Germany Europe 1967 70.800 76368453 14745.626
Germany Europe 1972 71.000 78717088 18016.180
Germany Europe 1977 72.500 78160773 20512.921
Germany Europe 1982 73.800 78335266 22031.533
Germany Europe 1987 74.847 77718298 24639.186
Germany Europe 1992 76.070 80597764 26505.303
Germany Europe 1997 77.340 82011073 27788.884
Germany Europe 2002 78.670 82350671 30035.802
Germany Europe 2007 79.406 82400996 32170.374
Italy Europe 1952 65.940 47666000 4931.404
Italy Europe 1957 67.810 49182000 6248.656
Italy Europe 1962 69.240 50843200 8243.582
Italy Europe 1967 71.060 52667100 10022.401
Italy Europe 1972 72.190 54365564 12269.274
Italy Europe 1977 73.480 56059245 14255.985
Italy Europe 1982 74.980 56535636 16537.483
Italy Europe 1987 76.420 56729703 19207.235
Italy Europe 1992 77.440 56840847 22013.645
Italy Europe 1997 78.820 57479469 24675.024
Italy Europe 2002 80.240 57926999 27968.098
Italy Europe 2007 80.546 58147733 28569.720
Slovak Republic Europe 1952 64.360 3558137 5074.659
Slovak Republic Europe 1957 67.450 3844277 6093.263
Slovak Republic Europe 1962 70.330 4237384 7481.108
Slovak Republic Europe 1967 70.980 4442238 8412.902
Slovak Republic Europe 1972 70.350 4593433 9674.168
Slovak Republic Europe 1977 70.450 4827803 10922.664
Slovak Republic Europe 1982 70.800 5048043 11348.546
Slovak Republic Europe 1987 71.080 5199318 12037.268
Slovak Republic Europe 1992 71.380 5302888 9498.468
Slovak Republic Europe 1997 72.710 5383010 12126.231
Slovak Republic Europe 2002 73.800 5410052 13638.778
Slovak Republic Europe 2007 74.663 5447502 18678.314
Sweden Europe 1952 71.860 7124673 8527.845
Sweden Europe 1957 72.490 7363802 9911.878
Sweden Europe 1962 73.370 7561588 12329.442
Sweden Europe 1967 74.160 7867931 15258.297
Sweden Europe 1972 74.720 8122293 17832.025
Sweden Europe 1977 75.440 8251648 18855.725
Sweden Europe 1982 76.420 8325260 20667.381
Sweden Europe 1987 77.190 8421403 23586.929
Sweden Europe 1992 78.160 8718867 23880.017
Sweden Europe 1997 79.390 8897619 25266.595
Sweden Europe 2002 80.040 8954175 29341.631
Sweden Europe 2007 80.884 9031088 33859.748
# make the result table of arranging
arrange_dat %>%
  kable("html", caption = "Result Table after arranging by  `gdpPercap`",
        col.names = c("Country", "Continent", "Year", "Life Expectancy", "Population", "GDP per capita")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),full_width = F)%>%
  column_spec(1, width = "10em", border_right = T) %>%
  column_spec(2, width = "10em")%>%
  scroll_box(width = "900px", height = "400px")
Result Table after arranging by gdpPercap
Country Continent Year Life Expectancy Population GDP per capita
Italy Europe 1952 65.940 47666000 4931.404
Slovak Republic Europe 1952 64.360 3558137 5074.659
Slovak Republic Europe 1957 67.450 3844277 6093.263
Italy Europe 1957 67.810 49182000 6248.656
Germany Europe 1952 67.500 69145952 7144.114
Slovak Republic Europe 1962 70.330 4237384 7481.108
Italy Europe 1962 69.240 50843200 8243.582
Slovak Republic Europe 1967 70.980 4442238 8412.902
Sweden Europe 1952 71.860 7124673 8527.845
Slovak Republic Europe 1992 71.380 5302888 9498.468
Slovak Republic Europe 1972 70.350 4593433 9674.168
Denmark Europe 1952 70.780 4334000 9692.385
Sweden Europe 1957 72.490 7363802 9911.878
Italy Europe 1967 71.060 52667100 10022.401
Germany Europe 1957 69.100 71019069 10187.827
Slovak Republic Europe 1977 70.450 4827803 10922.664
Denmark Europe 1957 71.810 4487831 11099.659
Slovak Republic Europe 1982 70.800 5048043 11348.546
Slovak Republic Europe 1987 71.080 5199318 12037.268
Slovak Republic Europe 1997 72.710 5383010 12126.231
Italy Europe 1972 72.190 54365564 12269.274
Sweden Europe 1962 73.370 7561588 12329.442
Germany Europe 1962 70.300 73739117 12902.463
Denmark Europe 1962 72.350 4646899 13583.314
Slovak Republic Europe 2002 73.800 5410052 13638.778
Italy Europe 1977 73.480 56059245 14255.985
Germany Europe 1967 70.800 76368453 14745.626
Sweden Europe 1967 74.160 7867931 15258.297
Denmark Europe 1967 72.960 4838800 15937.211
Italy Europe 1982 74.980 56535636 16537.483
Sweden Europe 1972 74.720 8122293 17832.025
Germany Europe 1972 71.000 78717088 18016.180
Slovak Republic Europe 2007 74.663 5447502 18678.314
Sweden Europe 1977 75.440 8251648 18855.725
Denmark Europe 1972 73.470 4991596 18866.207
Italy Europe 1987 76.420 56729703 19207.235
Denmark Europe 1977 74.690 5088419 20422.901
Germany Europe 1977 72.500 78160773 20512.921
Sweden Europe 1982 76.420 8325260 20667.381
Denmark Europe 1982 74.630 5117810 21688.040
Italy Europe 1992 77.440 56840847 22013.645
Germany Europe 1982 73.800 78335266 22031.533
Sweden Europe 1987 77.190 8421403 23586.929
Sweden Europe 1992 78.160 8718867 23880.017
Germany Europe 1987 74.847 77718298 24639.186
Italy Europe 1997 78.820 57479469 24675.024
Denmark Europe 1987 74.800 5127024 25116.176
Sweden Europe 1997 79.390 8897619 25266.595
Denmark Europe 1992 75.330 5171393 26406.740
Germany Europe 1992 76.070 80597764 26505.303
Germany Europe 1997 77.340 82011073 27788.884
Italy Europe 2002 80.240 57926999 27968.098
Italy Europe 2007 80.546 58147733 28569.720
Sweden Europe 2002 80.040 8954175 29341.631
Denmark Europe 1997 76.110 5283663 29804.346
Germany Europe 2002 78.670 82350671 30035.802
Denmark Europe 2002 77.180 5374693 32166.500
Germany Europe 2007 79.406 82400996 32170.374
Sweden Europe 2007 80.884 9031088 33859.748
Denmark Europe 2007 78.332 5468120 35278.419
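
The three tables above are long, so here is the same contrast as a minimal sketch on a four-row toy tibble (hypothetical values). It assumes dplyr and forcats are installed; the names `toy`, `reordered`, `arranged` and `by_group` are my own.

```r
library(dplyr)
library(forcats)

# Toy data (hypothetical values)
toy <- tibble(
  country = factor(c("A", "A", "B", "B")),
  gdp     = c(1, 2, 3, 4)
)

# fct_reorder() changes the level order only; the rows stay where they are
reordered <- toy %>%
  mutate(country = fct_reorder(country, gdp, .desc = TRUE))
levels(reordered$country)
reordered$gdp

# arrange() changes the row order only; the levels stay where they are
arranged <- toy %>% arrange(desc(gdp))
levels(arranged$country)
arranged$gdp

# arrange() respects groups only when .by_group = TRUE
by_group <- toy %>%
  group_by(country) %>%
  arrange(desc(gdp), .by_group = TRUE)
by_group$gdp
```

The two functions are complementary, so they can be combined when both the table and the plot should follow the same order.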

After comparing the two functions, I plot the three resulting datasets to see how their effects show up in the visualization.

# plot the three cases and make them side by side 
plot1 <- sub_dat %>% 
  ggplot(aes(year, gdpPercap, colour = country))+ 
  geom_point()+
  geom_line()+
  theme_bw()+
  ggtitle("Plot for gdpPercap per year \n - original")

plot2 <- reorder_dat %>% 
  ggplot(aes(year, gdpPercap, colour = country))+ 
  geom_point()+
  geom_line()+ 
  theme_bw()+
  ggtitle("Plot for gdpPercap per year \n - reordering")

plot3 <- arrange_dat %>% 
  ggplot(aes(year, gdpPercap, colour = country))+ 
  geom_point()+
  geom_line()+ 
  theme_bw()+
  ggtitle("Plot for gdpPercap per year \n - arranging")

grid.arrange(plot1,plot2,plot3,ncol = 3)

From the plots we can see that, even though arrange() changes the order of the observations, it has no effect on the plot: each country keeps the same color as in the plot of the original sub-dataset. In contrast, the colors in the reordering plot change, which shows that reordering the factor levels does affect the plot. (In the middle plot the legend now lists the countries in descending order of their median gdpPercap.)

Part 2: File I/O

Task Description: Experiment with write_csv()/ read_csv() , saveRDS()/ readRDS(). Create something new, probably by filtering or grouped-summarization of Singer or Gapminder. Fiddle with the factor levels, i.e. make them non-alphabetical. Explore whether this survives the round trip of writing to file then reading back in.

For this part, I first reorder the levels as in Part 1, but by the maximum population (descending) of the countries in Europe. After the reordering, the levels of country are no longer listed alphabetically.

# get the observations in Europe
gap_Europe <- gapminder %>%
  filter(continent == "Europe") 

# reorder the newly created data
gap_Europe_reorder<- gap_Europe %>%
  mutate(country = fct_reorder(country, pop, max, .desc = TRUE))

# first a few levels after reordering
head(levels(gap_Europe_reorder$country))
## [1] "Germany"        "Turkey"         "France"         "United Kingdom"
## [5] "Italy"          "Spain"
# comparing to the levels before reordering
head(levels(gap_Europe$country))
## [1] "Afghanistan" "Albania"     "Algeria"     "Angola"      "Argentina"  
## [6] "Australia"

Then I write the dataset to a csv file using write_csv() and read it back using read_csv(). I do the same with saveRDS() and readRDS() to see the difference between the two pairs of functions. The tables for the original and the imported datasets look the same. However, after the write_csv()/read_csv() round trip, country is a character vector instead of a factor. To check the levels, I convert country back to a factor with as.factor(), but the result uses the default alphabetical levels, so the reordered levels from the original dataset are lost. Meanwhile, saveRDS()/readRDS() preserves the attributes of the variables and therefore keeps the reordered country levels.

# write the reordered dataset into csv
write_csv(gap_Europe_reorder, "gap_Europe_reorder.csv")

# write the reordered dataset into  rds
saveRDS(gap_Europe_reorder, "gap_Europe_reorder.rds")

# read the newly created csv file
import_csv <- read_csv("gap_Europe_reorder.csv")

# read the newly created rds file
import_rds <- readRDS("gap_Europe_reorder.rds")

# make tables for the original dataset and import datasets
head(gap_Europe_reorder)%>%
  kable("html", caption = "First Parts of the Reordered Observations in original `gap_Europe_reorder`",col.names = c("Country", "Continent", "Year", "Life Expectancy", "Population", "GDP per capita")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),full_width = F)%>%
  column_spec(1, bold = T, border_right = T) %>%
  column_spec(2, width = "10em")
First Parts of the Reordered Observations in original gap_Europe_reorder
Country Continent Year Life Expectancy Population GDP per capita
Albania Europe 1952 55.23 1282697 1601.056
Albania Europe 1957 59.28 1476505 1942.284
Albania Europe 1962 64.82 1728137 2312.889
Albania Europe 1967 66.22 1984060 2760.197
Albania Europe 1972 67.69 2263554 3313.422
Albania Europe 1977 68.93 2509048 3533.004
head(import_csv) %>%
   kable("html", caption = "First Parts of the Reordered Observations by reading `gap_Europe_reorder.csv` ",col.names = c("Country", "Continent", "Year", "Life Expectancy", "Population", "GDP per capita")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),full_width = F)%>%
  column_spec(1, bold = T, border_right = T) %>%
  column_spec(2, width = "10em")
First Parts of the Reordered Observations by reading gap_Europe_reorder.csv
Country Continent Year Life Expectancy Population GDP per capita
Albania Europe 1952 55.23 1282697 1601.056
Albania Europe 1957 59.28 1476505 1942.284
Albania Europe 1962 64.82 1728137 2312.889
Albania Europe 1967 66.22 1984060 2760.197
Albania Europe 1972 67.69 2263554 3313.422
Albania Europe 1977 68.93 2509048 3533.004
head(import_rds) %>%
   kable("html", caption = "First Parts of the Reordered Observations by reading `gap_Europe_reorder.rds` ",col.names = c("Country", "Continent", "Year", "Life Expectancy", "Population", "GDP per capita")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),full_width = F)%>%
  column_spec(1, bold = T, border_right = T) %>%
  column_spec(2, width = "10em")
First Parts of the Reordered Observations by reading gap_Europe_reorder.rds
Country Continent Year Life Expectancy Population GDP per capita
Albania Europe 1952 55.23 1282697 1601.056
Albania Europe 1957 59.28 1476505 1942.284
Albania Europe 1962 64.82 1728137 2312.889
Albania Europe 1967 66.22 1984060 2760.197
Albania Europe 1972 67.69 2263554 3313.422
Albania Europe 1977 68.93 2509048 3533.004
# Check the levels for original dataset
head(levels(gap_Europe_reorder$country))
## [1] "Germany"        "Turkey"         "France"         "United Kingdom"
## [5] "Italy"          "Spain"
# check the class of `country` in the imported csv
class(import_csv$country)
## [1] "character"
# Check the levels for import csv file
head(levels(as.factor(import_csv$country)))
## [1] "Albania"                "Austria"               
## [3] "Belgium"                "Bosnia and Herzegovina"
## [5] "Bulgaria"               "Croatia"
# check the levels for import rds file
head(levels(import_rds$country))
## [1] "Germany"        "Turkey"         "France"         "United Kingdom"
## [5] "Italy"          "Spain"
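
If the csv format is required, the level order can be restored after reading by passing the levels explicitly. Below is a minimal sketch on toy data (hypothetical values). To keep it self-contained it uses base R's write.csv()/read.csv() rather than readr, and writes to temporary files.

```r
# Toy factor (hypothetical values) with a deliberately non-alphabetical level order
dat <- data.frame(
  country = factor(c("Germany", "Albania", "France"),
                   levels = c("Germany", "France", "Albania")),
  pop     = c(82, 3, 61)
)

csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

# CSV stores text only, so the level order is not written to the file
write.csv(dat, csv_path, row.names = FALSE)
from_csv  <- read.csv(csv_path, stringsAsFactors = FALSE)
csv_class <- class(from_csv$country)   # character

# Restore the order by passing the saved levels explicitly
from_csv$country <- factor(from_csv$country, levels = levels(dat$country))

# RDS serialises the R object, attributes included
saveRDS(dat, rds_path)
from_rds <- readRDS(rds_path)

identical(levels(from_csv$country), levels(dat$country))
identical(levels(from_rds$country), levels(dat$country))
```

The drawback of the csv route is that the level order has to be stored somewhere else (in the script or in a second file), whereas the rds file carries it along. With readr, I believe the equivalent is to supply col_factor(levels = ...) for the column through the col_types argument of read_csv().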

Part 3: Visualization Design

Task Description: Remake at least one figure or create a new one, in light of something you learned in the recent class meetings about visualization design and color. Reflect on the differences of your first attempt and what you obtained after some time spent working on it. If using Gapminder, you can use the country or continent color scheme that ships with Gapminder. Then, make a new graph by converting this visual to a plotly graph. What are some things that plotly makes possible, that are not possible with a regular ggplot2 graph?

# original plot from homework2
gapminder %>% ggplot(aes(gdpPercap, lifeExp)) + scale_x_log10()+
  geom_point() + 
  geom_smooth() + 
  facet_wrap(~continent, ncol=3)

For this part, I start by cleaning up a plot from homework 2 and trying out theming. First I add axis labels and a title using labs(), then apply the black-and-white theme with theme_bw(), and finally use theme() to adjust the axis text size and to change the background and text color of the facet strips. The points are now colored by continent, and the dollar labels on the x axis make the units of GDP per capita explicit.

# changing the look of the graphic using theme() layer
(plot5 <- gapminder %>% ggplot(aes(gdpPercap, lifeExp)) + 
  scale_x_log10(labels= dollar_format())+
  geom_point(alpha = 0.3, aes(color = continent))+ 
  geom_smooth() + 
  facet_wrap(~continent, ncol=3)+
  labs(x = "GDP per capita",
       y = "Life Expectancy",
       title = "Plot for GDP per capita versus life expectancy in the five continents" )+
  theme_bw()+
  theme(axis.text = element_text(size = 8),
        strip.background = element_rect(fill = "green4"),
        strip.text = element_text(color = "white")))

I will create another plot and make use of what I have learned in the recent class meetings about visualization design and color to make it more effective.

(plot4 <-
  gapminder %>% 
  # get only the countries of interest
  filter(country %in% c("Thailand", "Vietnam"))%>%
  ggplot(aes(gdpPercap, lifeExp, shape = country, color = pop))+
  # scale the gdpPercap
  scale_x_log10(labels = dollar_format())+
  geom_point(aes(size = pop))+ 
  scale_size_area()+
  scale_color_gradient(low = "#0091ff", high = "#f0650e")+
  # add labels and title
  labs(x = "GDP per capita",
       y = "Life Expectancy",
       title = "Plot for GDP per capita versus life expectancy of Thailand and Vietnam" )+
  # add theme
  theme_bw())

Next, I will convert plot4 (GDP per capita versus life expectancy of Thailand and Vietnam) and plot5 (GDP per capita versus life expectancy in the five continents) into plotly graphs.

In general, plotly provides a toolbar for interacting with the plot: we can zoom in and out and even download the plot directly. Moreover, hovering near a data point shows detailed information about it. Converting plot5 revealed a very useful feature: clicking a continent in the legend on the right hides all the points of that continent. This is useful for taking a closer look at the smooth line, or when plots overlap. Following up on this, I found that plotly also has a highlighting function, which is useful both for interacting with the plot and for making animations.

For animation in R, I also found a package called gganimate. I tried a few example plots but found it slower than making animations with plotly. gganimate is useful for making gifs, while I think plotly is more powerful because it lets users interact with the plot, which is extremely useful for plots with more dimensions.

# convert plot 4 and 5 into plotly
plotly::ggplotly(plot4)
plotly::ggplotly(plot5)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
g <- crosstalk::SharedData$new(gapminder, ~continent)
plot6 <- ggplot(g, aes(gdpPercap, lifeExp, color = continent, frame = year)) +
  geom_point(aes(size = pop, ids = country)) +
  geom_smooth(se = FALSE, method = "lm") +
  scale_x_log10(labels= dollar_format())+
  theme_bw()+
  labs(title = "Plot for GDP per capita versus life expectancy \n for different continents",
       x = "GDP per capita",
       y = "Life Expectancy")
## Warning: Ignoring unknown aesthetics: ids
plotly::ggplotly(plot6) %>% 
  plotly::highlight("plotly_hover")
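
The hover information mentioned above can also be customised. A minimal sketch, assuming ggplot2 and plotly are installed, with made-up data and the hypothetical names `toy`, `p` and `p_int`: the `text` aesthetic is not used by geom_point() itself (ggplot2 warns about it, much like the `ids` warning above), but ggplotly() can use it for the hover label through its `tooltip` argument.

```r
library(ggplot2)
library(plotly)

# Toy data (hypothetical values)
toy <- data.frame(
  gdp     = c(1000, 5000, 20000),
  lifeExp = c(50, 65, 78),
  country = c("A", "B", "C")
)

# `text` is ignored by geom_point() but kept in the plot data,
# so ggplotly() can show it on hover
p <- ggplot(toy, aes(gdp, lifeExp)) +
  geom_point(aes(text = paste("Country:", country))) +
  scale_x_log10()

# Show only the custom text when hovering over a point
p_int <- ggplotly(p, tooltip = "text")
p_int
```

This is something a static ggplot2 graph cannot offer: extra variables such as the country name can stay out of the picture until the reader asks for them.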

Part 4: Writing figures to file

Task Description: Use ggsave() to explicitly save a plot to file and reload it.

I will export plot4 (GDP per capita versus life expectancy of Thailand and Vietnam) as png figures using ggsave(), trying different size settings.

# ggsave without specifying width and height.
ggsave("./gdpPercap_vs_lifeExp_Thai_Viet.png", plot = plot4)
## Saving 7 x 5 in image
# ggsave with specifying width and height
ggsave("./gdpPercap_vs_lifeExp_Thai_Viet_wh.png", plot = plot4, width = 8, height = 8)

# ggsave with scale
ggsave("./gdpPercap_vs_lifeExp_Thai_Viet_sc.png", plot = plot4, width = 3, height = 3, scale = 2)

Now reload the saved figures to see the result:

If we don’t specify the plot in ggsave(), I found that it saves the plot that was displayed most recently. For example, after displaying an earlier plot such as plot1, ggsave() saves plot1:

plot1

ggsave("./test_plot.png")
## Saving 7 x 5 in image
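
Since the default depends on which plot happened to be shown last, it seems safer to always name the plot. A minimal sketch, assuming ggplot2 is installed and a png device is available; the toy plot and the names `p` and `out_file` are my own, and the file goes to a temporary location.

```r
library(ggplot2)

# Toy plot (hypothetical data)
p <- ggplot(data.frame(x = 1:3, y = c(2, 4, 8)), aes(x, y)) +
  geom_line()

out_file <- tempfile(fileext = ".png")

# Naming the plot avoids depending on whichever plot was displayed last;
# width and height are in inches by default and are multiplied by `scale`
ggsave(out_file, plot = p, width = 3, height = 3, scale = 2, dpi = 150)

file.exists(out_file)
```

Here `width = 3, height = 3, scale = 2` produces a 6 x 6 inch canvas, so the text and points look smaller relative to the figure than with `width = 3, height = 3` alone.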

“But I want to do more!” - Make a deeper exploration of the forcats packages

Reference and Source

  1. Sequential, diverging and qualitative colour scales from colorbrewer.org

    https://ggplot2.tidyverse.org/reference/scale_brewer.html

  2. Top 50 ggplot2 Visualizations - The Master List (With Full R Code)

    http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html

  3. gganimate vs. plotly - Which is better at animation?

    https://www.brucemeng.ca/post/animations-in-r/

  4. Intro to Animations in R

    https://plot.ly/r/animations/